Profile-Based Focused Crawling for Social Media-Sharing Websites
Abstract
We present a novel profile-based focused crawling system for the increasingly popular social media-sharing websites. In this system, user profiles serve as the ranking criteria that guide the crawling process. We divide a user’s profile into two parts: an internal part, which comes from the user’s own contributions, and an external part, which comes from the user’s social contacts. To expand the crawling topic, we adopt a cotagging topic-discovery scheme for social media-sharing websites. To extract data for focused crawling efficiently and effectively, we first develop a path string-based page classification method that identifies list pages, detail pages, and profile pages. Identifying the page type correctly is essential, because each type yields different information, which is then used to estimate a reasonable ranking for each link encountered while crawling. Our experiments demonstrate the robustness of the profile-based focused crawler and a significant improvement in harvest ratio over breadth-first and online page importance computation (OPIC) crawlers when crawling the Flickr website for two different topics.
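As a rough illustration of the ideas named in the abstract, the following Python sketch shows a crawl frontier ordered by a profile score, together with the usual harvest-ratio measure (relevant pages divided by pages crawled). The abstract does not give the scoring formula, so the tag-overlap score, the `alpha` weight, and every function and class name here are assumptions made for illustration, not the authors' method.

```python
import heapq
from itertools import count


def profile_score(internal_tags, external_tags, topic_tags, alpha=0.5):
    """Hypothetical score: blend topic overlap of the two profile parts.

    internal_tags: tags from the user's own contributions (assumed input).
    external_tags: tags from the user's social contacts (assumed input).
    alpha: assumed weight of the internal part, between 0 and 1.
    """
    topic = set(topic_tags)

    def overlap(tags):
        # Fraction of topic tags that also appear in this profile part.
        return len(set(tags) & topic) / len(topic) if topic else 0.0

    return alpha * overlap(internal_tags) + (1 - alpha) * overlap(external_tags)


class Frontier:
    """Priority queue of URLs; the highest-scoring link is crawled first."""

    def __init__(self):
        self._heap = []
        self._tie = count()  # keeps insertion order for equal scores

    def push(self, url, score):
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-score, next(self._tie), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def harvest_ratio(relevance_flags):
    """Share of crawled pages judged relevant to the topic."""
    flags = list(relevance_flags)
    return sum(flags) / len(flags) if flags else 0.0
```

In this sketch a breadth-first crawler would correspond to pushing every link with the same score, so the comparison in the abstract amounts to comparing `harvest_ratio` over the pages each ordering visits.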
Similar resources
Personalized Recommendation in Social Media: a Profile Expansion Approach
With the success of Web 2.0 applications, various social media websites have been established and become tremendous assets for supporting critical business intelligence applications. The knowledge gained from social media websites can not only meet the objectives of businesses offering them but also help the development of novel and effective services that are better tailored to users’ needs. I...
Accurate and Efficient Crawling for Relevant Websites
Focused web crawlers have recently emerged as an alternative to the well-established web search engines. While the well-known focused crawlers retrieve relevant webpages, there are various applications which target whole websites instead of single webpages. For example, companies are represented by websites, not by individual webpages. To answer queries targeted at websites, web directories are...
Efficient Social Website Crawling Using Cluster Graph; CU-CS-1056-09
Online social communities have gained significant popularity in recent years and have become an area of active research. Compared with general websites or well-structured Web forums, user-centered social websites pose several unique challenges for crawling, a fundamental task for data collection and data mining of large-scale online social communities: (1) Social websites have more complex link...
Efficient Social Website Crawling Using Cluster Graph
Online social communities have gained significant popularity in recent years and have become an area of active research. Compared with general websites or well-structured Web forums, user-centered social websites pose several unique challenges for crawling, a fundamental task for data collection and data mining of large-scale online social communities: (1) Social websites have more complex link...
Social Sharing Behavior Under E-Commerce Context
In the era of Web 2.0, social networking sites play an important role in generating online information. People create billions of shares such as web pages or videos with friends on these sites every month. The share on the social networking site is visible to all of her or his friends and could be clicked by them and generates traffic back to the website where the information is from. In order ...
Journal title:
- EURASIP J. Image and Video Processing
Volume 2009
Publication date: 2009